Weakly-Supervised Video Scene Co-parsing
نویسندگان
چکیده
In this paper, we propose a scene co-parsing framework to assign pixel-wise semantic labels in weakly-labeled videos, i.e., only videolevel category labels are given. To exploit rich semantic information, we first collect all videos that share the same video-level labels and segment them into supervoxels. We then select representative supervoxels for each category via a supervoxel ranking process. This ranking problem is formulated with a submodular objective function and a scene-object classifier is incorporated to distinguish scenes and objects. To assign each supervoxel a semantic label, we match each supervoxel to these selected representatives in the feature domain. Each supervoxel is then associated with a series of category potentials and assigned to a semantic label with the maximum one. The proposed co-parsing framework extends scene parsing from single images to videos and exploits mutual information among a video collection. Experimental results on the Wild-8 and SUNY-24 datasets show that the proposed algorithm performs favorably against the state-of-the-art approaches.
منابع مشابه
Movie/Script: Alignment and Parsing of Video and Text Transcription
Movies and TV are a rich source of diverse and complex video of people, objects, actions and locales “in the wild”. Harvesting automatically labeled sequences of actions from video would enable creation of large-scale and highlyvaried datasets. To enable such collection, we focus on the task of recovering scene structure in movies and TV series for object tracking and action retrieval. We prese...
متن کاملScene Parsing by Weakly Supervised Learning with Image Descriptions
This paper investigates a fundamental problem of scene understanding: how to parse a scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations). We propose a deep architecture consisting of two networks: i) a convolutional neural network (CNN) extracting the image representation for pixel-wise object labeling and ii) a recursive neural netw...
متن کاملWeakly supervised training for parsing Mandarin broadcast transcripts
We present a systematic investigation of applying weakly supervised co-training approaches to improve parsing performance for parsing Mandarin broadcast news (BN) and broadcast conversation (BC) transcripts, by iteratively retraining two competitive Chinese parsers from a small set of treebanked data and a large set of unlabeled data. We compare co-training to self-training, and our results sho...
متن کاملSemantic Graph Construction for Weakly-Supervised Image Parsing
We investigate weakly-supervised image parsing, i.e., assigning class labels to image regions by using imagelevel labels only. Existing studies pay main attention to the formulation of the weakly-supervised learning problem, i.e., how to propagate class labels from images to regions given an affinity graph of regions. Notably, however, the affinity graph of regions, which is generally construct...
متن کاملWeakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training
In this work, we present a novel method to detect violent shots in movies. The detection process is split into two views–––the audio and video views. From the audio-view, a weakly-supervised method is exploited to improve the classification performance. And from the video-view, we use a classifier to detect violent shots. Finally, the auditory and visual classifiers are combined in a co-trainin...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016